

Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning

Neural Information Processing Systems

Sample efficiency has been one of the major challenges for deep reinforcement learning. Recently, model-based reinforcement learning has been proposed to address this challenge by planning on imaginary trajectories with a learned world model. However, world-model learning may overfit to training trajectories, making model-based value estimation and policy search prone to getting stuck in an inferior local policy. In this paper, we propose a novel model-based reinforcement learning algorithm, called BrIdging Reality and Dream (BIRD). It maximizes the mutual information between imaginary and real trajectories so that policy improvements learned from imaginary trajectories generalize easily to real trajectories. We demonstrate that our approach improves the sample efficiency of model-based planning and achieves state-of-the-art performance on challenging visual control benchmarks.
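The abstract does not spell out how the mutual-information objective enters training. A minimal, hedged sketch of one plausible surrogate, assuming (without checking against the paper) that the objective decomposes into a world-model fit term on real transitions plus a policy-entropy bonus; all names and signatures below are illustrative, not the authors' API:

```python
import torch
import torch.nn.functional as F

def bird_style_loss(pred_next_obs, real_next_obs, action_logits,
                    entropy_coef=0.01):
    """Illustrative surrogate for an MI-style objective between
    imagined and real trajectories (an assumption, not the paper's
    exact loss): (1) minimize world-model prediction error on real
    transitions, (2) maximize policy entropy."""
    # (1) World-model fit: imagined transitions should match real ones.
    model_loss = F.mse_loss(pred_next_obs, real_next_obs)
    # (2) Policy entropy bonus, computed from discrete action logits.
    log_probs = F.log_softmax(action_logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    # Lower loss = better model fit and higher policy entropy.
    return model_loss - entropy_coef * entropy
```

Under this sketch, the two terms can be ablated independently by zeroing `entropy_coef` or by detaching `model_loss`, which is the kind of ablation the reviews discuss.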


Review for NeurIPS paper: Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning

Neural Information Processing Systems

Weaknesses: While I find this paper reasonably thorough, I am skeptical of the novelty. The two components that differentiate it from Dreamer both come from the mutual-information-maximization objective, which amounts to maximizing the policy entropy and minimizing the model loss. While there is an ablation showing what happens when the model-loss component is removed, there is no ablation removing the entropy maximization. My assumption is that the core reason for the improvement is the model loss, which would not be a surprising result. Adding this ablation would address the concern.


Review for NeurIPS paper: Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning

Neural Information Processing Systems

The paper introduces the BIRD algorithm, a model-based RL algorithm based on differentiable planning (SVG-like). A key aspect of BIRD is a mutual-information term in the loss function, which encourages similarity between imagined trajectories and real observations. Reviewers generally liked this paper, even though there were some concerns about the extent of its novelty, especially relative to Dreamer. I summarize some of the concerns here, which should be addressed in the revised version of this work. Please refer to the reviews for more detail, and revise your paper by incorporating their comments.

